Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: lychee link checker from built website version #855

Merged
merged 39 commits into from
Mar 8, 2024

Conversation

barjin
Copy link
Contributor

@barjin barjin commented Feb 12, 2024

No description provided.

@barjin barjin added the adhoc Ad-hoc unplanned task added during the sprint. label Feb 12, 2024
@barjin barjin self-assigned this Feb 12, 2024
@github-actions github-actions bot added this to the 83rd sprint - Tooling team milestone Feb 12, 2024
@github-actions github-actions bot added the t-tooling Issues with this label are in the ownership of the tooling team. label Feb 12, 2024
add docs.apify.com/assets to ignored lychee links
@TC-MO
Copy link
Contributor

TC-MO commented Feb 20, 2024

I did a bit of testing here to figure out the # of errors, I added any github edit links and assets links to ignored, which substantially lowered the # of errors.

What we are left is over 200 errors for docs.apify.com/api/v2/ since lychee is testing post build it checks the navbar links on every page now, and it seems that we have a links with trailing slash in the navbar, tho it throws errors only for API reference while docs.apify.com/api/client/js/ also has trailing slash but no error from what I can see

So the only issues I see with this approach is

  • It can only start once the build is done thus extending by few minutes the check (very small issue)
  • it check for a lot of unnecessary stuff like the navbar on every single page, github edits, assets (but all those can be alleviated with proper ignores in .lycheeignore file

Seems like this might be the best approach all in all

@barjin
Copy link
Contributor Author

barjin commented Feb 22, 2024

The difference between the api/v2/ and api/client/js/ might be that the latter is an index page of the entire JS client project, whereas the former is a single page in this (apify-docs) repository. The fix might be as simple as removing the slash from here, or creating an Nginx rewrite rule.

Are there any other issues directly with this PR, or can we fix the found broken links and merge this? Looking at the results, it looks quite helpful :)

@TC-MO
Copy link
Contributor

TC-MO commented Feb 22, 2024

Yes, I think we should move forward with this PR and with fixing api/v2/ links.

Apart from that I saw some file errors I'll try to figure out what those are, and npmjs I think also rate limited us similar to github, if this will persist maybe I'll add it to the exceptions

add <https:\/\/> to ignored by lychee
@TC-MO
Copy link
Contributor

TC-MO commented Feb 22, 2024

@barjin I'm moving along with fixing whatever I can for broken links, but it seems that lychee is not filtering out node_modules and checks those for broken links as well
obraz

And as for errors with npmjs I think it the only way is to either add them to ignore like with github edit links or accept 429 as ok response.

@barjin
Copy link
Contributor Author

barjin commented Feb 22, 2024

Regarding the broken links, I suppose it's just wrong arguments, right? Maybe like this it will work?

--base https://docs.apify.com --verbose --no-progress 'build/**/*.html'

Ignoring npm is probably fine, we don't link too many external projects in our docs... and completely unpublishing something from the npm registries is quite hard, so not much worries there

fix broken external links to langchain & llamaindex
change link from relative to absolute for test purposes
@TC-MO
Copy link
Contributor

TC-MO commented Feb 26, 2024

  • as for using build/**/*.html this might not be enough as lychee is also checking this

obraz

apart from that there are links that are working within IDE & live docs , but are considered broken in Lychee, not sure if this is due to caching?

obraz
obraz

@TC-MO
Copy link
Contributor

TC-MO commented Feb 29, 2024

@barjin

I think we may be at end here and soon ready for merging.

  • I added --exclude-path to filter out .html files from build/versions & node_modules it worked for the build/version but not for node_modules could you take a look and double check, maybe I messed up somewhere?

  • fixed almost all broken link apart from our API reference. We have those two sets of links that are working on live docs, and working within IDE but for some reason Lychee still treats them as errors. I would rather not add them as exception since they are our internal links but maybe we can defer to Docusaurus here?

obraz

@B4nan
Copy link
Member

B4nan commented Feb 29, 2024

Jindra is out on vacation btw, he'll be back next wednesday. I can try to look into this myself.

@TC-MO
Copy link
Contributor

TC-MO commented Feb 29, 2024

ah did not notice, my apologies it would be much appreciated @B4nan

@barjin
Copy link
Contributor Author

barjin commented Mar 6, 2024

I just pushed fixes to the issues mentioned in your last message @TC-MO , feel free to check them out.

The only remaining thing now seems to be some forgotten Google rate-limiting and occasionally flaky Apify server - while the first can be fixed with yet another ignore rule, the second one should be fixed... perhaps with some retries? I didn't find much in the Lychee docs, maybe you know more @TC-MO as you've used this before?

obrazek

@TC-MO
Copy link
Contributor

TC-MO commented Mar 6, 2024

Thanks @barjin ! That looks extremely promising!

  • Regarding retries, per Lychee docs we can set them up using --max-retries x argument with x being the number of retries we want to try.

  • Regarding the google sheet, I don't think it is rate limiting us, from what I see we are linking to a google spreadsheet that is with restricted access will need to get to the bottom of it first thing tomorrow

@barjin barjin merged commit 3367c15 into master Mar 8, 2024
8 checks passed
@barjin barjin deleted the feat/lychee-link-checker branch March 8, 2024 11:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
adhoc Ad-hoc unplanned task added during the sprint. t-tooling Issues with this label are in the ownership of the tooling team.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants